AITopics | Mahwah

The performance and usability of Large Language Models (LLMs) are driving their use in explanation generation tasks. However, despite their widespread adoption, LLM explanations have been found to be unreliable, making it difficult for users to distinguish good explanations from bad ones. To address this issue, we present Rubrik's CUBE, an education-inspired rubric, together with a dataset of 26k explanations written and later quality-annotated with the rubric by both humans and six open- and closed-source LLMs. The CUBE dataset covers two reasoning tasks and two language tasks, providing the diversity needed to test the proposed rubric effectively. Using Rubrik, we find that explanation quality is influenced by both the task and its perceived difficulty. Low quality stems primarily from a lack of conciseness in LLM-generated explanations, rather than from problems with cohesion or word choice. The full dataset, rubric, and code will be made available upon acceptance.

explanation, large language model, machine learning

arXiv.org Artificial Intelligence

2503.23899

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
North America > United States > California > San Francisco County > San Francisco (0.14)
Europe > Italy > Tuscany > Florence (0.04)

Genre:

Overview (0.92)
Research Report > Experimental Study (0.92)
Research Report > New Finding (0.67)

Industry:

Health & Medicine (1.00)
Education > Educational Setting (1.00)
Education > Curriculum > Subject-Specific Education (1.00)
Education > Educational Technology > Educational Software > Computer-Aided Assessment (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Autonomous Learning with High-Dimensional Computing Architecture Similar to von Neumann's

Kanerva, Pentti

arXiv.org Artificial Intelligence, Mar-30-2025

We model human and animal learning by computing with high-dimensional vectors (e.g., H = 10,000). The architecture resembles traditional (von Neumann) computing with numbers, but the instructions refer to vectors and operate on them in superposition. The architecture includes a high-capacity memory for vectors, an analogue of the random-access memory (RAM) for numbers. The model's ability to learn from data resembles deep learning, but with an architecture closer to biology. The architecture agrees with the idea from psychology that human memory and learning involve a short-term working memory and a long-term data store. Neuroscience provides a model of the long-term memory, namely the cortex of the cerebellum. With roots in psychology, biology, and traditional computing, a theory of computing with vectors can help us understand how brains compute. Application to robot learning seems inevitable, but there are likely to be further applications, including language. Ultimately, we want to compute with no more material and energy than brains use. To that end, we need a mathematical theory that agrees with psychology and biology and is suitable for nanotechnology. We also need to exercise the theory in large-scale experiments. Computing with vectors is described here in terms familiar from traditional computing with numbers.
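
The abstract describes instructions that act on high-dimensional vectors held in superposition, plus a memory that maps a noisy vector back to a stored one. The following is a minimal sketch of that idea, assuming random bipolar (+1/-1) vectors, element-wise multiplication for binding, and element-wise majority for superposition. These are common conventions in hyperdimensional computing and not necessarily the exact operations of this paper. The `ItemMemory` class is a hypothetical, greatly simplified stand-in for the high-capacity vector memory, and the role and filler names (`COLOR`, `red`, ...) are illustrative only.

```python
import random

H = 10_000  # dimensionality, matching the abstract's example


def random_vector(rng: random.Random) -> list[int]:
    """Random bipolar (+1/-1) hypervector; two such vectors are nearly orthogonal."""
    return [rng.choice((-1, 1)) for _ in range(H)]


def bind(a: list[int], b: list[int]) -> list[int]:
    """Element-wise multiplication; self-inverse for bipolar vectors (bind(bind(a, b), b) == a)."""
    return [x * y for x, y in zip(a, b)]


def bundle(*vs: list[int]) -> list[int]:
    """Superpose vectors by element-wise majority; use an odd count to avoid ties."""
    return [1 if sum(col) > 0 else -1 for col in zip(*vs)]


def similarity(a: list[int], b: list[int]) -> float:
    """Normalized dot product in [-1, 1]; about 0 for unrelated random vectors."""
    return sum(x * y for x, y in zip(a, b)) / H


class ItemMemory:
    """Toy clean-up memory: returns the name of the most similar stored vector."""

    def __init__(self) -> None:
        self.items: dict[str, list[int]] = {}

    def store(self, name: str, vec: list[int]) -> None:
        self.items[name] = vec

    def recall(self, query: list[int]) -> str:
        return max(self.items, key=lambda n: similarity(self.items[n], query))


rng = random.Random(0)
memory = ItemMemory()
for name in ("COLOR", "SHAPE", "SIZE", "red", "blue", "circle", "square", "small", "large"):
    memory.store(name, random_vector(rng))
v = memory.items

# One vector holds the whole record {COLOR: red, SHAPE: circle, SIZE: small} in superposition.
record = bundle(bind(v["COLOR"], v["red"]), bind(v["SHAPE"], v["circle"]), bind(v["SIZE"], v["small"]))

# Query a field: unbind the role, then clean up the noisy result against item memory.
print(memory.recall(bind(record, v["COLOR"])))  # red
```

The unbound query is only about 50% similar to `red`, while unrelated items sit near 0, so the clean-up step recovers the right filler with a wide margin at this dimensionality.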

artificial intelligence, machine learning, vector

arXiv.org Artificial Intelligence

2503.23608

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
North America > United States > New York (0.04)
Europe > United Kingdom > Wales (0.04)

Genre: Research Report (0.50)

Industry:

Health & Medicine > Therapeutic Area > Neurology (0.67)
Government > Regional Government > North America Government > United States Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Cognitive Science (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.48)

Cyborg Data: Merging Human with AI Generated Training Data

North, Kai, Ormerod, Christopher

arXiv.org Artificial Intelligence, Mar-26-2025

Automated scoring (AS) systems used in large-scale assessment have traditionally relied on small statistical models that need a large quantity of hand-scored data to make accurate predictions, which is time-consuming and costly to collect. Generative large language models are trained on many tasks and have shown an impressive ability to generalize to new tasks with little or no data. These models require substantially more computational power to make predictions and still need some fine-tuning to meet operational standards. Evidence suggests that they can exceed human-human levels of agreement even when fine-tuned on small amounts of data. With this in mind, we propose a model distillation pipeline in which a large generative model, the Teacher, teaches a much smaller model, the Student. The Teacher, trained on a small subset of the training data, scores the remaining training data, which is then used to train the Student. We call the resulting dataset "Cyborg Data", as it combines human- and machine-scored responses. Our findings show that Student models trained on Cyborg Data perform comparably to models trained on the entire dataset while requiring only 10% of the original hand-scored data.
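
The data flow of the Teacher-Student pipeline can be sketched as below. This is a framework-agnostic illustration, not the authors' code: `build_cyborg_data`, `toy_teacher`, and the word-count scoring rule are hypothetical. In the paper the Teacher is a fine-tuned generative LLM and the Student a much smaller model; here the Teacher is replaced by a 1-nearest-neighbour rule so the example runs on its own, and Student training is left to whatever model consumes the returned list.

```python
import random
from dataclasses import dataclass
from typing import Callable, List, Tuple

Scorer = Callable[[str], int]


@dataclass
class ScoredResponse:
    text: str
    score: int
    source: str  # "human" or "teacher"


def build_cyborg_data(
    responses: List[str],
    hand_score: Scorer,
    train_teacher: Callable[[List[Tuple[str, int]]], Scorer],
    human_fraction: float = 0.10,
    seed: int = 0,
) -> List[ScoredResponse]:
    """Hand-score a small random subset, train a Teacher on it, let the Teacher score the rest."""
    order = list(range(len(responses)))
    random.Random(seed).shuffle(order)
    k = max(1, round(human_fraction * len(responses)))
    human_idx, teacher_idx = order[:k], order[k:]

    # 1. Only this subset is sent to human raters (the costly step).
    human = [(responses[i], hand_score(responses[i])) for i in human_idx]
    # 2. Fit the Teacher on the hand-scored subset.
    teacher = train_teacher(human)
    # 3. The Teacher scores every remaining response.
    data = [ScoredResponse(t, s, "human") for t, s in human]
    data += [ScoredResponse(responses[i], teacher(responses[i]), "teacher") for i in teacher_idx]
    # 4. The combined "Cyborg Data" is what the Student would be trained on.
    return data


def toy_teacher(labeled: List[Tuple[str, int]]) -> Scorer:
    """Hypothetical stand-in for a fine-tuned LLM: 1-nearest neighbour on word count."""
    ref = [(len(text.split()), score) for text, score in labeled]

    def predict(text: str) -> int:
        n = len(text.split())
        return min(ref, key=lambda p: abs(p[0] - n))[1]

    return predict


responses = ["word " * n for n in range(1, 101)]
cyborg = build_cyborg_data(responses, hand_score=lambda t: min(len(t.split()) // 25, 3), train_teacher=toy_teacher)
print(sum(r.source == "human" for r in cyborg), len(cyborg))  # 10 100
```

Keeping a `source` field on each example makes it easy to compare a Student trained on the full Cyborg set against one trained only on the human-scored portion.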

large language model, machine learning, natural language

arXiv.org Artificial Intelligence

2503.22736

Country:

North America > United States > Texas > Travis County > Austin (0.04)
North America > United States > New Jersey > Bergen County > Mahwah (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)

Genre: Research Report > New Finding (1.00)

Industry: Education > Educational Technology > Educational Software > Computer-Aided Assessment (0.67)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

General Scales Unlock AI Evaluation with Explanatory and Predictive Power

Zhou, Lexin, Pacchiardi, Lorenzo, Martínez-Plumed, Fernando, Collins, Katherine M., Moros-Daval, Yael, Zhang, Seraphina, Zhao, Qinlin, Huang, Yitian, Sun, Luning, Prunty, Jonathan E., Li, Zongqian, Sánchez-García, Pablo, Chen, Kexin Jiang, Casares, Pablo A. M., Zu, Jiyun, Burden, John, Mehrbakhsh, Behzad, Stillwell, David, Cebrian, Manuel, Wang, Jindong, Henderson, Peter, Wu, Sherry Tongshuang, Kyllonen, Patrick C., Cheke, Lucy, Xie, Xing, Hernández-Orallo, José

arXiv.org Artificial Intelligence, Mar-15-2025

Ensuring safe and effective use of AI requires understanding and anticipating its performance on novel tasks, from advanced scientific challenges to transformed workplace activities. So far, benchmarking has guided progress in AI, but it has offered limited explanatory and predictive power for general-purpose AI systems, given the low transferability across diverse tasks. In this paper, we introduce general scales for AI evaluation that can explain what common AI benchmarks really measure, extract ability profiles of AI systems, and predict their performance on new task instances, both in- and out-of-distribution. Our fully automated methodology builds on 18 newly crafted rubrics that place instance demands on general scales that do not saturate. In an illustration with 15 large language models and 63 tasks, inspecting the demand and ability profiles yields high explanatory power, offering insights into the sensitivity and specificity of different benchmarks and into how knowledge, metacognition, and reasoning are affected by model size, chain-of-thought, and distillation. Surprisingly, these demand levels also enable high predictive power at the instance level, giving better estimates than black-box baseline predictors based on embeddings or fine-tuning, especially in out-of-distribution settings (new tasks and new benchmarks). The scales, rubrics, battery, techniques, and results presented here represent a major step for AI evaluation, underpinning the reliable deployment of AI in the years ahead. (Collaborative platform: https://kinds-of-intelligence-cfi.github.io/ADELE.)
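
To make the idea of predicting instance success from demand and ability profiles concrete, here is a toy sketch. It is an assumption-laden illustration and not the paper's predictor: it assumes each scale contributes an item-response-style logistic factor in (ability minus demand), that all demanded scales must be met, and a fixed `slope`. The function name `predict_success`, the scale names, and all numbers are hypothetical.

```python
import math
from typing import Dict

Profile = Dict[str, float]  # scale name -> level (0 means no demand on that scale)


def predict_success(ability: Profile, demand: Profile, slope: float = 2.0) -> float:
    """Toy predictor: multiply one logistic factor per demanded scale.

    Each factor is 0.5 when ability equals demand, rises toward 1 as ability
    exceeds demand, and falls toward 0 as demand exceeds ability.
    """
    p = 1.0
    for scale, level in demand.items():
        if level <= 0:
            continue  # no demand on this scale, so no constraint
        gap = ability.get(scale, 0.0) - level
        p *= 1.0 / (1.0 + math.exp(-slope * gap))
    return p


# Hypothetical ability profile of one model and demand profiles of two instances.
model = {"reasoning": 3.0, "knowledge": 4.0, "metacognition": 2.0}
easy = {"reasoning": 1.0, "knowledge": 0.0}
hard = {"reasoning": 3.0, "knowledge": 4.0}

print(round(predict_success(model, easy), 3))  # 0.982
print(predict_success(model, hard))  # 0.25
```

The multiplicative form encodes a conjunctive assumption (failing any one scale sinks the instance); a learned predictor over the demand vector, which the abstract's comparison with black-box baselines suggests, would not need that assumption.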

data mining, large language model, machine learning

arXiv.org Artificial Intelligence

2503.06378

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.28)
Europe > Austria > Vienna (0.13)
Europe > France (0.04)

Genre:

Instructional Material (1.00)
Questionnaire & Opinion Survey (0.92)
Overview (0.92)

Industry:

Leisure & Entertainment > Sports (1.00)
Law (1.00)
Health & Medicine > Pharmaceuticals & Biotechnology (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)

Do Construction Distributions Shape Formal Language Learning In German BabyLMs?

Bunzeck, Bastian, Duran, Daniel, Zarrieß, Sina

arXiv.org Artificial Intelligence, Mar-14-2025

We analyze the influence of utterance-level construction distributions in German child-directed speech on the resulting formal linguistic competence and the underlying learning trajectories for small language models trained on a novel collection of developmentally plausible language data for German. We find that trajectories are surprisingly robust for markedly different distributions of constructions in the training data, which have little effect on final accuracies and almost no effect on global learning trajectories. While syntax learning benefits from more complex utterances, lexical learning culminates in better scores with more fragmentary data. We argue that LMs trained on developmentally plausible data can contribute to debates on how rich or impoverished linguistic stimuli actually are.

computational linguistic, machine learning, natural language

arXiv.org Artificial Intelligence

2503.11593

Country:

Asia > Middle East > UAE > Abu Dhabi Emirate > Abu Dhabi (0.14)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
North America > United States > Florida > Miami-Dade County > Miami (0.04)

Genre: Research Report (1.00)

Industry: Education > Curriculum > Subject-Specific Education (0.41)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.82)

Sensemaking in Novel Environments: How Human Cognition Can Inform Artificial Agents

Patterson, Robert E., Buccello-Stout, Regina, Frame, Mary E., Maresca, Anna M., Nelson, Justin, Acker-Mills, Barbara, Curtis, Erica, Culbertson, Jared, Schmidt, Kevin, Clouse, Scott, Rogers, Steve

arXiv.org Artificial Intelligence, Mar-10-2025

One of the most vital cognitive skills is the ability to make sense of objects, events, and situations in the world. In the current paper, we offer an approach for creating artificially intelligent agents with the capacity for sensemaking in novel environments. Objectives: to present two key ideas: (1) a novel unified conceptual framework for sensemaking (which includes the existence of sign relations embedded within and across frames); and (2) interaction among various content-addressable, distributed-knowledge structures via shared attributes (whose net response would represent a synthesized object, event, or situation serving as a sign for sensemaking in a novel environment). Findings: we suggest that attributes across memories can be shared and recombined in novel ways to create synthesized signs, which can denote certain outcomes in novel environments (i.e., sensemaking).

cognition, representation, sign relation

arXiv.org Artificial Intelligence

2503.07783

Country:

North America > Canada > Ontario > Toronto (0.28)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.14)

Genre: Research Report (1.00)

Industry:

Health & Medicine > Therapeutic Area > Neurology (1.00)
Government > Regional Government > North America Government > United States Government (1.00)
Education (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Deictic Codes, Demonstratives, and Reference: A Step Toward Solving the Grounding Problem

Raftopoulos, Athanassios, Müller, Vincent C.

arXiv.org Artificial Intelligence, Mar-5-2025

In this paper, we address the issue of grounding for experiential concepts. Given that perceptual demonstratives are a basic form of such concepts, we examine ways of fixing the referents of such demonstratives. To avoid "encodingism", that is, relating representations only to other representations, we postulate that the process of reference fixing must be bottom-up and nonconceptual, so that it can break the circle of conceptual content and touch the world. For that purpose, an appropriate causal relation between representations and the world is needed. We claim that this relation is provided by spatial and object-centered attention, which leads to the formation of object files through the function of deictic acts. This entire causal process takes place at a pre-conceptual level, meeting the requirement for a solution to the grounding problem. Finally, we claim that our account captures fundamental insights of Putnam's and Kripke's work on "new" reference.

individuation, information, referent

arXiv.org Artificial Intelligence

2503.03495

Country:

Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
North America > United States > New York (0.04)
North America > United States > New Jersey > Bergen County > Mahwah (0.04)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning (0.48)